MODBASE, a database of annotated comparative protein structure models and associated resources.
MODBASE (http://salilab.org/modbase) is a database of annotated comparative protein structure models. The models are calculated by MODPIPE, an automated modeling pipeline that relies primarily on MODELLER for fold assignment, sequence-structure alignment, model building and model assessment (http://salilab.org/modeller). MODBASE currently contains 5,152,695 reliable models for domains in 1,593,209 unique protein sequences; only models based on statistically significant alignments and/or models assessed to have the correct fold are included. MODBASE also allows users to calculate comparative models on demand, through an interface to the MODWEB modeling server (http://salilab.org/modweb). Other resources integrated with MODBASE include databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites (LIGBASE), predicted ligand binding sites (AnnoLyze), structurally defined binary domain interfaces (PIBASE) and annotated single nucleotide polymorphisms and somatic mutations found in human proteins (LS-SNP, LS-Mut). MODBASE models are also available through the Protein Model Portal (http://www.proteinmodelportal.org/)
MODBASE: a database of annotated comparative protein structure models and associated resources
MODBASE is a database of annotated comparative protein structure models for all available protein sequences that can be matched to at least one known protein structure. The models are calculated by MODPIPE, an automated modeling pipeline that relies on MODELLER for fold assignment, sequence–structure alignment, model building and model assessment. MODBASE is updated regularly to reflect the growth in protein sequence and structure databases, and improvements in the software for calculating the models. MODBASE currently contains 3 094 524 reliable models for domains in 1 094 750 out of 1 817 889 unique protein sequences in the UniProt database (July 5, 2005); only models based on statistically significant alignments and models assessed to have the correct fold despite insignificant alignments are included. MODBASE also allows users to generate comparative models for proteins of interest with the automated modeling server MODWEB. Our other resources integrated with MODBASE include comprehensive databases of multiple protein structure alignments (DBAli), structurally defined ligand binding sites and structurally defined binary domain interfaces (PIBASE) as well as predictions of ligand binding sites, interactions between yeast proteins, and functional consequences of human nsSNPs (LS-SNP)
GeMMA: functional subfamily classification within superfamilies of predicted protein structural domains
GeMMA (Genome Modelling and Model Annotation) is a new approach to automatic functional subfamily classification within families and superfamilies of protein sequences. A major advantage of GeMMA is its ability to subclassify very large and diverse superfamilies with tens of thousands of members, without the need for an initial multiple sequence alignment. Its performance is shown to be comparable to the established high-performance method SCI-PHY. GeMMA follows an agglomerative clustering protocol that uses existing software for sensitive and accurate multiple sequence alignment and profile–profile comparison. The produced subfamilies are shown to be equivalent in quality whether whole protein sequences are used or just the sequences of component predicted structural domains. A faster, heuristic version of GeMMA that also uses distributed computing is shown to maintain the performance levels of the original implementation. The use of GeMMA to increase the functional annotation coverage of functionally diverse Pfam families is demonstrated. It is further shown how GeMMA clusters can help to predict the impact of experimentally determining a protein domain structure on comparative protein modelling coverage, in the context of structural genomics
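The agglomerative protocol mentioned in the abstract above can be illustrated with a minimal sketch. This is not GeMMA's implementation: the abstract says the method uses profile–profile comparison, whereas the sketch swaps in a toy k-mer Jaccard score so that it runs without external tools. The function names, the `threshold` parameter and the example sequences are all hypothetical.

```python
from itertools import combinations


def kmer_set(seq, k=3):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def cluster_similarity(a, b, k=3):
    """Toy stand-in for profile-profile comparison (an assumption, not
    GeMMA's scoring): Jaccard index of the pooled k-mers of two clusters."""
    ka = set().union(*(kmer_set(s, k) for s in a))
    kb = set().union(*(kmer_set(s, k) for s in b))
    union = ka | kb
    return len(ka & kb) / len(union) if union else 0.0


def agglomerate(sequences, threshold=0.5):
    """Start with one cluster per sequence and repeatedly merge the most
    similar pair until no pair scores at or above the threshold."""
    clusters = [[s] for s in sequences]
    while len(clusters) > 1:
        (i, j), best = max(
            (((i, j), cluster_similarity(clusters[i], clusters[j]))
             for i, j in combinations(range(len(clusters)), 2)),
            key=lambda item: item[1],
        )
        if best < threshold:
            break
        merged = clusters[i] + clusters[j]
        clusters = [c for idx, c in enumerate(clusters)
                    if idx not in (i, j)] + [merged]
    return clusters


if __name__ == "__main__":
    for cluster in agglomerate(["ACDEFGHIK", "ACDEFGHIR", "WWYYWWYYW"]):
        print(cluster)
```

Because clustering starts from single sequences and compares clusters pairwise, no initial multiple sequence alignment of the whole superfamily is required, which is the property the abstract highlights. The exhaustive pairwise search here is quadratic per merge and only suitable for small inputs.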
Regulatory Elements within the Prodomain of Falcipain-2, a Cysteine Protease of the Malaria Parasite Plasmodium falciparum
Falcipain-2, a papain family cysteine protease of the malaria parasite Plasmodium falciparum, plays a key role in parasite hydrolysis of hemoglobin and is a potential chemotherapeutic target. As with many proteases, falcipain-2 is synthesized as a zymogen, and the prodomain inhibits activity of the mature enzyme. To investigate the mechanism of regulation of falcipain-2 by its prodomain, we expressed constructs encoding different portions of the prodomain and tested their ability to inhibit recombinant mature falcipain-2. We identified a C-terminal segment (Leu155–Asp243) of the prodomain, including two motifs (ERFNIN and GNFD) that are conserved in cathepsin L sub-family papain family proteases, as the mediator of prodomain inhibitory activity. Circular dichroism analysis showed that the prodomain including the C-terminal segment, but not constructs lacking this segment, was rich in secondary structure, suggesting that the segment plays a crucial role in protein folding. The falcipain-2 prodomain also efficiently inhibited other papain family proteases, including cathepsin K, cathepsin L, cathepsin B, and cruzain, but it did not inhibit cathepsin C or tested proteases of other classes. A structural model of pro-falcipain-2 was constructed by homology modeling based on crystallographic structures of mature falcipain-2, procathepsin K, procathepsin L, and procaricain, offering insights into the nature of the interaction between the prodomain and mature domain of falcipain-2 as well as into the broad specificity of inhibitory activity of the falcipain-2 prodomain
Using neural networks and evolutionary information in decoy discrimination for protein tertiary structure prediction
Background: We present a novel method of protein fold decoy discrimination using machine learning, more specifically using neural networks. Here, decoy discrimination is represented as a machine learning problem, where neural networks are used to learn the native-like features of protein structures using a set of positive and negative training examples. A set of native protein structures provides the positive training examples, while negative training examples are simulated decoy structures obtained by reversing the sequences of native structures. Various features are extracted from the training dataset of positive and negative examples and used as inputs to the neural networks. Results: The best-performing neural network is the one whose input comprises PSI-BLAST [1] profiles of residue pairs, pairwise distances and the relative solvent accessibilities of the residues. This neural network is the best among all methods tested in discriminating the native structure from a set of decoys for all decoy datasets tested. Conclusion: This method is demonstrated to be viable, and furthermore evolutionary information is successfully used in the neural networks to improve decoy discrimination
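The construction of negative training examples described in the abstract above might look like the following sketch. It assumes one plausible reading of "reversing the sequences": the native backbone coordinates are kept and the residue sequence is assigned to them in reverse order. The abstract does not spell out this detail, and the function name and data layout are hypothetical.

```python
def make_training_pair(sequence, coords):
    """Build one positive and one negative example from a native structure.

    sequence: one-letter residue codes, e.g. "MKTAY"
    coords:   one (x, y, z) tuple per residue, in chain order
    Returns a list of (sequence, coords, label) with label 1 = native,
    0 = decoy. Assumption: the decoy reuses the native coordinates with
    the sequence reversed.
    """
    if len(sequence) != len(coords):
        raise ValueError("sequence and coords must have the same length")
    native = (sequence, list(coords), 1)
    decoy = (sequence[::-1], list(coords), 0)
    return [native, decoy]


if __name__ == "__main__":
    toy_coords = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
    for seq, xyz, label in make_training_pair("MKT", toy_coords):
        print(label, seq, xyz)
```

A palindromic sequence would yield a decoy identical to its native example, so a real pipeline would presumably need to filter such cases.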
Trends in template/fragment-free protein structure prediction
Predicting the structure of a protein from its amino acid sequence is a long-standing unsolved problem in computational biology. Its solution would be of both fundamental and practical importance as the gap between the number of known sequences and the number of experimentally solved structures widens rapidly. Currently, the most successful approaches are based on fragment/template reassembly. The lack of progress in template-free structure prediction calls for novel ideas and approaches. This article reviews trends in the development of physical and specific knowledge-based energy functions as well as sampling techniques for fragment-free structure prediction. Recent physical- and knowledge-based studies demonstrated that it is possible to sample and predict highly accurate protein structures without borrowing native fragments from known protein structures. These emerging approaches with fully flexible sampling have the potential to move the field forward
ASSESSMENT AND PREDICTION OF PROTEIN STRUCTURES
An ambitious goal of modern biology is to understand the structure(s), interaction(s) and function(s) of each protein within cells and organisms. Understanding the nature of the interactions a protein makes is important because no protein exists in isolation, but rather functions through interactions with other macromolecules. Knowledge about the function of proteins is essential to understanding biological processes. Structure is the unifying component: both interactions and functions are intrinsically related to structure, as the structure of a protein helps define its function and affects the nature, type, and number of interactions it has with other macromolecules. Great attention has been paid to the development of methods for both the theoretical prediction and experimental determination of protein structure. Though experimentally derived structures are more accurate, they are relatively scarce: of the millions of known protein sequences, far fewer than 1% have had their structures solved experimentally. In the absence of an experimentally determined structure, computational models are often valuable for generating testable hypotheses and giving insight into existing experimental data. Such computational structure models are available for over two orders of magnitude more protein sequences than are experimentally determined structures, yet suffer from two limitations that experimentally determined structures do not: they frequently contain significant errors, and their accuracy cannot be readily assessed. The research described herein sought to increase the accuracy and applicability of computational protein models by addressing these two limitations.
This broad goal was approached in four principal ways: (1) identifying the most native-like models from among sets of similar models; (2) predicting the absolute accuracy of protein structure models; (3) improving the accuracy of target/template alignments to increase the accuracy of comparative models built from distantly related template structures; and (4) developing a unified protein structure prediction protocol that makes the best use of all available information about the structure of a given protein, regardless of whether it is directly based on experiment, on the broader knowledge base, on statistical potentials, or intuition
Feature-rich distance-based terrain synthesis
This paper describes a novel terrain synthesis method based on distances in a weighted graph. A height field is determined by least-cost paths in a weighted graph from a set of generator nodes. The shapes of individual terrain features, such as mountains, hills, and craters, are specified by a monotonically decreasing profile describing the cross-sectional shape of a feature. The locations of features in the terrain are specified by placing the generators; secondary ridges are placed by pathing. We show the method to be robust and easy to control, even making it possible to embed images in terrain shadows. The method can produce a wide range of realistic synthetic terrains such as mountain ranges, craters, cinder cones, and hills. The ability to manually place terrain features that incorporate multiple profiles produces heterogeneous terrains that compare favorably to existing methods
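The core idea in the abstract above (heights derived from least-cost path distances to generator nodes, shaped by a monotonically decreasing profile) can be sketched as follows. This is an illustrative reconstruction and not the paper's implementation: the 4-connected grid, the per-cell step costs, the linear `cone_profile` and all names are assumptions, and ridge placement by pathing is omitted.

```python
import heapq


def least_cost_distances(weights, generators):
    """Multi-source Dijkstra over a 4-connected grid.

    weights[r][c] is the (assumed) cost of stepping into cell (r, c);
    generators is a list of (row, col) cells whose distance is 0.
    """
    rows, cols = len(weights), len(weights[0])
    dist = [[float("inf")] * cols for _ in range(rows)]
    heap = []
    for r, c in generators:
        dist[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))
    while heap:
        d, r, c = heapq.heappop(heap)
        if d > dist[r][c]:
            continue  # stale queue entry
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                nd = d + weights[nr][nc]
                if nd < dist[nr][nc]:
                    dist[nr][nc] = nd
                    heapq.heappush(heap, (nd, nr, nc))
    return dist


def cone_profile(distance, peak=10.0, slope=1.0):
    """Hypothetical feature profile: height falls linearly with distance
    and is clamped at zero."""
    return max(0.0, peak - slope * distance)


def synthesize(weights, generators, profile=cone_profile):
    """Height field: apply the profile to each cell's least-cost distance."""
    dist = least_cost_distances(weights, generators)
    return [[profile(d) for d in row] for row in dist]


if __name__ == "__main__":
    flat = [[1.0] * 5 for _ in range(5)]
    for row in synthesize(flat, [(2, 2)]):
        print(" ".join(f"{h:4.1f}" for h in row))
```

With uniform weights the result is a regular peak around the generator; varying the weights distorts the least-cost distances and therefore the shape of the feature, and a different profile function changes the cross-section.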